Comparison with Other Implementations of Regular Expressions

ImageGear Professional DLL v18.1 for Windows

If you already know UNIX-style regular expressions, here are the main differences between the regular expressions used in Nuance OmniPage Capture SDK and in most UNIX implementations:

In ImageGear Recognition, anchors ('^', '$') are implicit: Capture SDK always assumes a '^' at the beginning and a '$' at the end of your regular expression, so the pattern must match the entire field. Because each regular expression describes a single field, the ability to match anywhere within a longer text is rarely needed. Adding the anchors explicitly does no harm. If you do want your pattern to match anywhere within a longer text, put ".*" at the beginning and end of your regular expression.
In ImageGear Recognition, regular expressions are composed of Unicode characters rather than ASCII, ANSI, or another 8-bit encoding. This makes it possible to use national characters (such as 'Ä', 'ü', or 'Ñ') freely. You can also use the special character classes "\l" and "\u" to match any lowercase or uppercase character, respectively, as defined by the current language setting within ImageGear Recognition.
Back-references are not supported, so you cannot use a pattern such as "(.*)\1".
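The implicit-anchor rule described above can be illustrated with Python's standard re module. Its re.fullmatch function requires the pattern to match the whole string, which is similar to how ImageGear Recognition treats each field. This is a minimal sketch of the matching semantics only and does not show the ImageGear Recognition API. field_matches is a hypothetical helper.

```python
import re


def field_matches(pattern: str, field_text: str) -> bool:
    """Hypothetical helper: emulate implicit anchoring on a single field."""
    # re.fullmatch behaves as if the pattern were wrapped in '^...$'.
    return re.fullmatch(pattern, field_text) is not None


# Implicit anchors: a bare pattern must cover the entire field.
print(field_matches(r"[0-9]+", "12345"))          # True
print(field_matches(r"[0-9]+", "Invoice 12345"))  # False

# Explicit anchors are harmless and give the same result.
print(field_matches(r"^[0-9]+$", "12345"))        # True

# Surround the pattern with '.*' to accept a match anywhere in the field.
print(field_matches(r".*[0-9]+.*", "Invoice 12345"))  # True
```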
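Python's re module has no direct equivalent of the "\l" and "\u" classes described above. In Python, "\u" introduces a Unicode code-point escape instead. The following sketch rewrites both classes into explicit character classes before matching. It uses a hypothetical German alphabet in place of the real language setting. GERMAN_LOWER, GERMAN_UPPER, and expand_case_classes are illustrative names and are not part of ImageGear Recognition.

```python
import re

# Hypothetical alphabet standing in for a German language setting; the real
# character set used by ImageGear Recognition depends on its configuration.
GERMAN_LOWER = "abcdefghijklmnopqrstuvwxyzäöü"
GERMAN_UPPER = GERMAN_LOWER.upper()  # "ABC...ZÄÖÜ"


def expand_case_classes(pattern: str, lower: str, upper: str) -> str:
    """Rewrite '\\l' and '\\u' into explicit character classes.

    Assumes '\\l' and '\\u' appear outside bracket expressions; every other
    escape sequence is copied through unchanged.
    """
    lower_class = "[" + re.escape(lower) + "]"
    upper_class = "[" + re.escape(upper) + "]"
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "l":
                out.append(lower_class)
            elif nxt == "u":
                out.append(upper_class)
            else:
                out.append(ch + nxt)  # keep any other escape as-is
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# A capitalized surname: one uppercase letter followed by lowercase letters.
surname = expand_case_classes(r"\u\l+", GERMAN_LOWER, GERMAN_UPPER)
print(re.fullmatch(surname, "Müller") is not None)  # True
print(re.fullmatch(surname, "müller") is not None)  # False
```

Because Python str patterns are Unicode, characters such as 'ü' can appear in the pattern and the text without any special handling.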
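Back-references are not available, so a condition such as "both halves of the field are identical" cannot be expressed in the pattern itself. One possible workaround is to match the shape of the field with a pattern that has no back-references and then compare the parts in ordinary code. The Python sketch below assumes that your application can inspect the recognized text. It also includes a simple check that flags numeric back-references such as "\1" in a pattern. has_backreference and repeated_code_matches are hypothetical helpers, not ImageGear Recognition functions.

```python
import re


def has_backreference(pattern: str) -> bool:
    """Return True if the pattern contains a numeric back-reference like '\\1'."""
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            if pattern[i + 1] in "123456789":
                return True
            i += 2  # skip the escaped character, e.g. the 'd' in \d
        else:
            i += 1
    return False


def repeated_code_matches(field_text: str) -> bool:
    """Accept 'XYZ-XYZ' style fields without using a back-reference."""
    # Match the overall shape with a back-reference-free pattern...
    m = re.fullmatch(r"([A-Z]+)-([A-Z]+)", field_text)
    # ...then enforce "both halves are equal" in ordinary code.
    return m is not None and m.group(1) == m.group(2)


print(has_backreference(r"(.*)\1"))      # True
print(has_backreference(r"\\1"))         # False: escaped backslash, then '1'
print(repeated_code_matches("ABC-ABC"))  # True
print(repeated_code_matches("ABC-ABD"))  # False
```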